Test-time adaptation (TTA) has attracted significant attention because it adapts a pre-trained model to a new domain using only the target dataset at inference time. Prior works on TTA assume that the target dataset comes from a single distribution and thus constitutes a single homogeneous domain. In practice, however, the target domain can contain multiple homogeneous domains that are sufficiently distinct from each other, and those domains may recur cyclically. Our preliminary investigation shows that domain-specific TTA outperforms vanilla TTA that treats the compound domain (CD) as a single domain. However, domain labels are not available for CD, which makes domain-specific TTA impracticable. To this end, we propose an online clustering algorithm that finds pseudo-domain labels, obtaining benefits similar to those of the domain-specific configuration and effectively accumulating knowledge of cyclic domains. Moreover, we observe a significant discrepancy in prediction quality among samples, especially in the CD context. This further motivates us to boost performance with gradient denoising that considers the image-wise similarity with the source distribution. Overall, the key contribution of our work lies in proposing a new and significant task, compound domain test-time adaptation (CD-TTA) on semantic segmentation, and in providing a strong baseline for future works to benchmark against.
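The idea of assigning pseudo-domain labels online can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes each test image has already been reduced to a small feature vector (for example, channel statistics), and the class name `OnlineDomainClusterer`, the distance threshold, and the cluster cap are all hypothetical choices made for illustration.

```python
import math


class OnlineDomainClusterer:
    """Toy online clustering: nearest centroid, or open a new cluster."""

    def __init__(self, new_cluster_dist, max_clusters=8):
        # Both values are illustrative assumptions, not taken from the paper.
        self.new_cluster_dist = new_cluster_dist
        self.max_clusters = max_clusters
        self.centroids = []  # one running-mean vector per pseudo-domain
        self.counts = []     # number of samples seen per pseudo-domain

    def assign(self, feat):
        """Return a pseudo-domain label for `feat` and update that centroid."""
        nearest = None
        if self.centroids:
            dists = [math.dist(feat, c) for c in self.centroids]
            nearest = min(range(len(dists)), key=dists.__getitem__)
            too_far = dists[nearest] > self.new_cluster_dist
            if too_far and len(self.centroids) < self.max_clusters:
                nearest = None  # far from every known domain: open a new one
        if nearest is None:
            self.centroids.append(list(feat))
            self.counts.append(1)
            return len(self.centroids) - 1
        # Running-mean update keeps accumulating knowledge of a recurring domain.
        self.counts[nearest] += 1
        lr = 1.0 / self.counts[nearest]
        self.centroids[nearest] = [
            c + lr * (x - c) for c, x in zip(self.centroids[nearest], feat)
        ]
        return nearest


clusterer = OnlineDomainClusterer(new_cluster_dist=1.0)
stream = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [0.0, 0.1]]
labels = [clusterer.assign(f) for f in stream]
```

In this toy stream the last sample returns to the first region of feature space, so it receives the label of the first cluster again, which mirrors the cyclic-domain setting in a very simplified way.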
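Masked autoencoders are scalable vision learners, as the title of MAE \cite{He2022masked} indicates, which suggests that self-supervised learning (SSL) in vision might follow a trajectory similar to that in NLP. Specifically, generative pretext tasks with masked prediction (e.g., BERT) have become the de facto standard SSL practice in NLP. By contrast, early attempts at generative methods in vision were overshadowed by their discriminative counterparts (e.g., contrastive learning); however, the success of masked image modeling has revived the masked autoencoder (often referred to as the denoising autoencoder in the past). As a milestone in bridging the gap with BERT in NLP, the masked autoencoder has attracted unprecedented attention for SSL in vision and beyond. This work conducts a comprehensive survey of masked autoencoders to shed light on a promising direction for SSL. As the first to review SSL with masked autoencoders, this work focuses on their application in vision by discussing historical developments, recent progress, and implications for diverse applications.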
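Recent progress in implicit 3D representation, i.e., Neural Radiance Fields (NeRFs), has made accurate and photorealistic 3D reconstruction possible in a differentiable manner. This new representation can effectively convey the information of hundreds of high-resolution images in one compact format and allows photorealistic synthesis of novel views. In this work, using a variant of NeRF called Plenoxels, we create the first large-scale implicit representation dataset for perception tasks, called PeRFception, which consists of two parts that incorporate both object-centric and scene-centric scans for classification and segmentation. It shows a significant memory compression rate (96.4\%) relative to the original dataset, while containing both 2D and 3D information in a unified form. We construct classification and segmentation models that directly take this implicit format as input, and we also propose a novel augmentation technique to avoid overfitting on image backgrounds. The code and data are publicly available at https://postech-cvlab.github.io/perfception.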
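Recent 3D registration methods can effectively handle large-scale or partially overlapping point pairs. However, despite its practicality, matching pairs that are unbalanced in spatial scale and density has been overlooked. We present a novel 3D registration method, called UPPNet, for unbalanced point pairs. We propose a hierarchical framework that finds inlier correspondences effectively by gradually reducing the search space. Our method predicts the subregions of the target points that are likely to overlap with the query points. The subsequent super-point matching module and fine-grained refinement module estimate accurate inlier correspondences between the two point clouds. Furthermore, we apply geometric constraints to refine the correspondences so that they satisfy spatial compatibility. Correspondence prediction is trained end-to-end, and our approach can predict the proper rigid transformation with a single forward pass given an unbalanced point cloud pair. To validate the efficacy of the proposed method, we create the KITTI-UPP dataset by augmenting the KITTI LiDAR dataset. Experiments on this dataset show that the proposed approach significantly outperforms state-of-the-art pairwise point cloud registration methods, with a 78% improvement in Registration Recall when the target point cloud is about 10$\times$ spatially larger and about 10$\times$ denser than the query point cloud.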
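3D convolutional networks are prevalent in many 3D vision tasks, including object detection, segmentation, registration, and various perception tasks for 3D inputs. However, due to the sparsity and irregularity of 3D data, custom 3D operators or network designs have been the primary focus of 3D research, while the size of networks and the efficacy of parameters have been overlooked. In this work, we perform the first comprehensive study of weight sparsity in spatially sparse 3D convolutional networks and propose a compact weight-sparse and spatially sparse 3D convnet (WS^3-ConvNet) for semantic segmentation and instance segmentation. We employ various network pruning strategies to find compact networks and show that our WS^3-ConvNet achieves a minimal loss in performance (2.15% drop) with an orders-of-magnitude smaller number of parameters (1/100 compression rate). Finally, we systematically analyze the compression patterns of WS^3-ConvNet and show interesting emerging sparsity patterns that are common in our compressed networks and can be used to further speed up inference.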
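Weight sparsity obtained through pruning can be illustrated with magnitude pruning, one common pruning strategy. The abstract does not say which strategies the authors use, so this is only a sketch under that assumption; the function names `magnitude_prune` and `sparsity` and the flat weight list are hypothetical simplifications of a real convolution kernel.

```python
def magnitude_prune(weights, keep_ratio):
    """Zero out all but the largest-magnitude fraction `keep_ratio` of weights."""
    n_keep = max(1, round(len(weights) * keep_ratio))
    # Indices sorted by decreasing absolute value; the first n_keep survive.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    keep = set(order[:n_keep])
    return [w if i in keep else 0.0 for i, w in enumerate(weights)]


def sparsity(weights):
    """Fraction of weights that are exactly zero."""
    return sum(1 for w in weights if w == 0.0) / len(weights)


kernel = [0.5, -0.01, 0.2, -0.9, 0.03]  # toy flattened kernel
pruned = magnitude_prune(kernel, keep_ratio=0.4)
```

With `keep_ratio=0.4` only the two largest-magnitude weights survive; a 1/100 compression rate as reported in the abstract would correspond to `keep_ratio=0.01` on a much larger weight tensor.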
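Autonomous vehicles (AVs) must interact with a diverse set of human drivers in heterogeneous geographic areas. Ideally, fleets of AVs should share trajectory data to continually re-train and improve trajectory prediction models from collective experience using cloud-based distributed learning. At the same time, these robots should ideally avoid uploading raw driver interaction data, in order to protect proprietary policies (when sharing insights with other companies) or to protect driver privacy. Federated learning (FL) is a popular mechanism for learning models on cloud servers from diverse users without divulging private local data. However, FL is often not robust: it learns sub-optimal models when user data come from highly heterogeneous distributions, which is a key hallmark of human-robot interaction. In this paper, we present a novel variant of personalized FL that specializes robust robot learning models to diverse user distributions. Our algorithm outperforms standard FL benchmarks by up to 2x in real user studies that we conducted, in which human-operated vehicles must gracefully merge with simulated AVs in the standard CARLA and CARLO AV simulators.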
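As a rough illustration of how federated learning aggregates models without sharing raw data, and how a personalization step might specialize the result per user, the sketch below combines standard federated averaging with a simple interpolation between global and local parameters. The interpolation is a generic personalization heuristic chosen for illustration, not the variant proposed in the paper; `fed_avg`, `personalize`, and `alpha` are hypothetical names.

```python
def fed_avg(client_params, client_sizes):
    """Federated averaging: sample-count-weighted mean of client parameters."""
    total = sum(client_sizes)
    dim = len(client_params[0])
    return [
        sum(p[j] * n for p, n in zip(client_params, client_sizes)) / total
        for j in range(dim)
    ]


def personalize(global_params, local_params, alpha):
    """Blend global and local parameters; alpha=1 keeps the local model only."""
    return [alpha * l + (1.0 - alpha) * g for g, l in zip(global_params, local_params)]


# Only parameter vectors leave each client; the raw driving data stay local.
clients = [[1.0, 0.0], [3.0, 2.0]]
sizes = [1, 3]
global_model = fed_avg(clients, sizes)
client0_model = personalize(global_model, clients[0], alpha=0.5)
```

When client distributions are highly heterogeneous, the plain average can fit no client well, which is the failure mode that motivates personalization in the first place.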
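For robots to understand human instructions and perform meaningful tasks in the near future, it is important to develop learned models that comprehend referential language in order to identify common objects in real-world 3D scenes. In this paper, we introduce a spatial-language model for the 3D visual grounding problem. Specifically, given a reconstructed 3D scene in the form of a point cloud with 3D bounding boxes of potential object candidates, and a language utterance referring to a target object in the scene, our model successfully identifies the target object from a set of potential candidates. LanguageRefer uses a transformer-based architecture that combines spatial embeddings from bounding boxes with fine-tuned language embeddings from DistilBert to predict the target object. We show that it performs competitively on the visio-linguistic datasets proposed by ReferIt3D. Further, we analyze its performance on spatial reasoning tasks decoupled from perception noise, its accuracy on view-dependent utterances, and viewpoint annotations for potential robotics applications.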
The development of social media user stance detection and bot detection methods relies heavily on large-scale, high-quality benchmarks. However, in addition to low annotation quality, existing benchmarks generally have incomplete user relationships, which hinders graph-based account detection research. To address these issues, we propose a Multi-Relational Graph-Based Twitter Account Detection Benchmark (MGTAB), the first standardized graph-based benchmark for account detection. To our knowledge, MGTAB was built from the largest original dataset in the field, with over 1.55 million users and 130 million tweets. MGTAB contains 10,199 expert-annotated users and 7 types of relationships, ensuring high-quality annotation and diversified relations. In MGTAB, we extracted the 20 user property features with the greatest information gain, together with user tweet features, as the user features. In addition, we performed a thorough evaluation of MGTAB and other public datasets. Our experiments found that graph-based approaches are generally more effective than feature-based approaches and perform better when multiple relations are introduced. By analyzing the experimental results, we identify effective approaches for account detection and provide potential future research directions in this field. Our benchmark and standardized evaluation procedures are freely available at: https://github.com/GraphDetec/MGTAB.
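Selecting the features with the greatest information gain can be sketched as follows. This is a minimal illustration for discrete features, not the benchmark's actual extraction code; the feature names (`verified`, `has_url`) and the helper `top_k_features` are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """H(labels) minus the entropy of labels conditioned on a discrete feature."""
    n = len(labels)
    conditional = 0.0
    for v in set(feature_values):
        subset = [y for x, y in zip(feature_values, labels) if x == v]
        conditional += len(subset) / n * entropy(subset)
    return entropy(labels) - conditional


def top_k_features(columns, labels, k):
    """Names of the k features with the highest information gain."""
    gains = {name: information_gain(vals, labels) for name, vals in columns.items()}
    return sorted(gains, key=gains.get, reverse=True)[:k]


labels = ["bot", "bot", "human", "human"]
columns = {
    "verified": [0, 0, 1, 1],  # perfectly separates the two classes
    "has_url": [0, 1, 0, 1],   # carries no information about the class
}
selected = top_k_features(columns, labels, k=1)
```

Continuous properties such as follower counts would need to be discretized first, or ranked with an estimator designed for continuous variables.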
The interview is regarded as one of the most crucial steps in recruitment. To fully prepare for interviews with recruiters, job seekers usually practice mock interviews with each other. However, a mock interview with peers is generally far from the real interview experience: the mock interviewers are not guaranteed to be professional and are unlikely to behave like a real interviewer. Due to the rapid growth of online recruitment in recent years, recruiters tend to hold online interviews, which makes it possible to collect real interview data from real interviewers. In this paper, we propose a novel application named EZInterviewer, which aims to learn from online interview data and provide mock interview services to job seekers. The task is challenging in two ways: (1) interview data are now available but still low-resource; (2) generating meaningful and relevant interview dialogs requires a thorough understanding of both resumes and job descriptions. To address the low-resource challenge, EZInterviewer is trained on a very small set of interview dialogs. The key idea is to reduce the number of parameters that rely on interview dialogs by disentangling the knowledge selector and the dialog generator, so that most parameters can be trained with ungrounded dialogs as well as resume data, which are not low-resource. Evaluation results on a real-world job interview dialog dataset indicate that we achieve promising results in generating mock interviews. With the help of EZInterviewer, we hope to make mock interview practice easier for job seekers.
Dynamic treatment regimes assign personalized treatments to patients sequentially over time based on their baseline information and time-varying covariates. In mobile health applications, these covariates are typically collected at different frequencies over a long time horizon. In this paper, we propose a deep spectral Q-learning algorithm, which integrates principal component analysis (PCA) with deep Q-learning to handle the mixed frequency data. In theory, we prove that the mean return under the estimated optimal policy converges to that under the optimal one and establish its rate of convergence. The usefulness of our proposal is further illustrated via simulations and an application to a diabetes dataset.